- Simple Compression using an LZ buffer
- Part 3 Revision 1.d:
- An introduction to compression on the Amiga by Adisak Pochanayon
-
- Freely Distributable as long as reproduced completely.
- Copyright 1993 Adisak Pochanayon
-
- -------------------------------------------------------------------------
- In Part 1 (now obsolete) and Part 2 of the Amiga Compression series,
- I covered a new and extremely fast RLE (run-length encoding) method for
- graphics.
-
- This article will delve further into the world of compression on
- the Amiga by covering my latest experiment in compression techniques and
- showing the use of a simple LZ history compression algorithm.
-
- In the second part of this article, I describe a Compressor (and
- DeCompressor) program in C. It is not written for optimal speed, simply
- because I wanted the code to be readable, and the place where speed is
- really needed is decompression (which happens in the super-fast assembly
- routine). Using this code along with unpacker.asm will allow you to
- easily include any raw compressed data in your own programs.
-
- Important Notes and Disclaimers.
-
- NOTE 1:
-
- This is a "beta" document. I welcome your suggestions but before you
- employ the methods mentioned in this document, please try to realize
- that I hope to provide a completely new compressor within a week or
- two that should be an order of magnitude faster. I may end up
- slightly changing the compression technique as well to provide
- better compression. I also welcome suggestions to improve the
- "readability" of the article.*
-
- *These features have since been added: hashing with collision
- detection in the compressor, the capability to alter the TAGs easily,
- and a new unpacking algorithm.
-
- NOTE 2:
-
- A very common comment that I have received is "Why not use XPK or
- PowerPacker libraries?" I respond with several reasons. (I also
- removed a paragraph that some compression library programmers
- found insulting -- Please understand that compression libraries
- are good for the Amiga, but they are not always exactly what a
- programmer wants.)
-
- The first is simply that I wish to provide an alternative method.
- Although I have been "urged" to let professionals handle compression,
- I would argue that any "amateur" can help the field. My compression
- algorithm has already consistently beaten ZOO 2.1 and ARC 5.0 in
- compression ratio, and decompression from RAM: to RAM: is realistically
- more than twice as fast as ZOO's decoding. When SLZ is used within a
- program and compared to ZOO used on the RAM: filing system (an unfair
- test, I know), SLZ beats ZOO by a speed factor of approximately 15.
- So if you feel that a technique that achieves better compression than
- the widely used ZOO and decompresses twice as fast is for amateurs,
- then you might be right. Then again, you might be wrong. I believe
- this is a viable "alternative" for Amiga programmers.
-
- The second involves distribution: PowerPacker cannot be distributed
- with commercial products, and modules of XPK may be likewise restricted
- at the discretion of their authors. This algorithm is being released
- as Freely Distributable without restriction on commercial applications,
- although I retain the copyright. Also, many programmers would rather
- have direct control over their data than go through the overhead of a
- library, regardless of how small that overhead is.
-
- Another factor, of course, is speed. SLZ has one of the fastest LZ
- unpacking algorithms available. This is partly due to the very small
- overhead imposed on the routine; the original asm encoding plus the
- enhancements made by Jesse Michael play no small role in the speed
- factor either. Combine that with the simple yet elegant data format
- and you get some awesome speed. To quote U.D. Mueller,
- "Good implementations of the algorithm used by ZOO are about
- an estimated half as fast as yours (check xpkBLZW)." So if you want
- TWICE the speed, drop ZOO -- go SLZ.
-
- The final reason is that, even if you decide not to use SLZ, I hope this
- article provides an educational benefit to Amiga programmers. I have
- certainly learned a lot by my experiments in data compression. There
- is so little "public" information on data compression available even if
- you read the compression newsgroups. I have also seen several postings
- on the c.s.a.p. newsgroup recently requesting further information on
- compression algorithms. With the large positive response I got on the
- RLE compression article earlier this year, I thought that an LZ
- article might also be received positively.
-
- NOTE 3:
-
- I have been told by several people that the compression algorithm I
- created is almost identical to the LZRW algorithm (partially patented).
- One person even sent me the specs of LZRW by e-mail. Indeed, the
- two algorithms are similar. However, there are some distinct
- differences. SLZ attains better compression than LZRW by using
- exhaustive history searching. SLZ also now uses 16-collision
- HASHING for compression, compared to the single-slot HASH used by LZRW.
- Furthermore, SLZ codes compressed strings slightly differently, both
- in the END-TAGs and in the compressed string length. By doing so,
- SLZ again allows better compression (with a 4096-byte history,
- SLZ allows string lengths to vary between 3 and 17 while LZRW
- has a maximum of 15). Due to the data representation differences
- and the differences in the actual compression method, SLZ will
- *ALWAYS* obtain better compression ratios than LZRW. Yet another
- difference is the ability to alter the TAG in SLZ so that a
- 2048-byte History and a 33-byte maximum string can be used. Finally,
- SLZ has a (currently) faster decompression algorithm than LZRW.
- Because of all these differences, I maintain that SLZ is a new
- algorithm that is simply in the same family as LZRW (after all, I did
- say it is an LZ compression routine).
-
- I don't believe I have violated any patents with SLZ or broken any
- copyrights as I genuinely came up with this idea on my own. If
- I had access to other good compression source code, I wouldn't have
- written SLZ to begin with. However, if I am violating any such
- ownership of intellectual property, I will remove SLZ from Freely
- Distributable status and I will ask anyone who has a copy of this
- article to destroy it and forbid my code from being used for any
- purposes on the Amiga.
-
-
- Comments, questions, spelling errors, or grammar police? Let me know :)
-
- Improvements from 3.0a:
- I have more than doubled the speed of compression on text and many
- other forms of data by modifying my HASH code to allow for up to
- 16 HASH collisions to be detected before brute force exhaustive
- searching occurs. Users who do not use the hash option will not
- notice the speedup in the C routines.
-
- I have also made the files more flexible to modify the TAG/CS
- bit-orderings. This will allow users to customize the file for
- greater compression by trading off length of compression strings
- for length of compression history. I have found that a history
- of 2048 instead of 4096 works much better for many graphics images
- which have lots of "line" oriented repeats. This allows me to
- compress longer repeat strings with a single CS tag.
-
- Many thanks to Jesse Michael of Haze of Epsilon (jdm@key880.rain.com)
- for rewriting the assembly. He unrolled my loops, freed a register,
- restructured my branching, and made copying literal strings up to
- three times faster (on a 68000). Basically, he built a Porsche out of
- VW parts (joke - the original Porsches were built with VW parts :).
- The asm code is similar to the original (same excellent comments) and
- still easy to read. The asm takes a little more space, but the
- emphasis of this compressor has always been the speed of decompression.
-
- Thanks also go out to Urban Dominik Mueller (UDM) for pointing out some
- mistakes I made in my previous article and allowing me to correct them.
- Also, thank you for pointing out some obvious optimizations that I had
- not implemented, such as unrolled loops.
-
- Finally, thanks to the 7-8 other people who responded with interesting
- suggestions and to the person who sent me the SPECs on the similar
- LZRW algorithm.
-
- Disclaimer:
-
- Neither Adisak Pochanayon nor SilverFox SoftWare warrants the use
- of this article for any purpose. The author holds no responsibility
- for the accuracy of the description of the routines or any adverse
- side effects they may cause when used on any machine.
-
- -------------------------------------------------------------------------
-
- Adisak Pochanayon
- 2525 University Avenue Apt #J
- Madison WI 53706
- 608-238-2463
- pochanay@cae.wisc.edu
-
- So you've plugged in your RLE compressor from Parts 1 and 2 and found
- that including fast compressed graphics is really easy to do in programs.
- But you want to include help text or other data that isn't easily compressed
- by RLE methods. You don't want to give up the incredible speed of
- straight RLE, yet you want better compression.
-
- One simple solution might be to use one of the fine compression
- libraries available on the Amiga. However, this isn't always the best
- solution. Some libraries restrict commercial distribution, other
- libraries don't provide source code, and still others require you to
- allocate complicated buffers and handles before using them. None of
- them are guaranteed to work if you have shut down the OS or
- multitasking, or are using nonstandard programming methods (not that I
- would ever advocate OS shutdowns). Even libraries that are simple to
- use are not always tolerant of "braindead" installation. If a library
- is missing, or is the wrong revision, your program won't run -- or
- worse, it could crash the machine.
-
- For a beginning or intermediate programmer, the "best" solution might
- be the simplest solution: a very small, compact routine that can easily
- be linked or added to programs and used without worrying about opening
- libraries, looking for include files, allocating buffers, or any other
- overhead. However, to be viable, the routine would have to feature
- very fast decompression and come with full source for any custom
- modifications that a more experienced programmer might wish to make.
- Before deciding on an exact format for the compression, a programmer
- might want to examine the available algorithms.
-
- RLE, or run-length encoding, is great for graphics where you have many
- repeating bytes of data. That's why it was chosen as part of the IFF
- standard. However, RLE's compression of this article would be feeble at
- best, and the results would be noticeably poorer than ZOO, ARC, or LHA.
- RLE's biggest advantage is its extreme speed (described in Part 2 of
- this series).
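-
- To make the difference concrete, here is a minimal sketch of a
- byte-oriented RLE packer in C. This is purely an illustration of the
- idea, NOT the format or routine from Parts 1 and 2: a hypothetical
- 0xFF marker byte introduces a (count-1, value) pair, and everything
- else is copied literally.
-
-     /* Minimal RLE sketch (illustration only): a 0xFF marker byte is
-      * followed by (count-1) and the repeated byte; any other byte is
-      * copied literally.  Literal 0xFF bytes are written as runs of
-      * length 1 so the decoder stays unambiguous. */
-     long rle_pack(unsigned char *src, long len, unsigned char *dst)
-     {
-         long in = 0, out = 0;
-
-         while (in < len)
-         {
-             long run = 1;
-             while (in + run < len && src[in + run] == src[in] && run < 256)
-                 run++;
-
-             if (run > 3 || src[in] == 0xFF)    /* worth coding as a run */
-             {
-                 dst[out++] = 0xFF;             /* run marker            */
-                 dst[out++] = (unsigned char)(run - 1);
-                 dst[out++] = src[in];
-                 in += run;
-             }
-             else                               /* copy literal bytes    */
-             {
-                 while (run--)
-                     dst[out++] = src[in++];
-             }
-         }
-         return out;                            /* packed size in bytes  */
-     }
-
- On a bitplane full of identical bytes this shrinks data dramatically;
- on English text, where long runs are rare, it does almost nothing.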
-
- A better method might be to reduce the "size" of codes for ASCII
- characters that appear often (e,r,o,s,t,l,etc.) and increase the "size"
- of codes that do not appear often (z,q,x,etc.). Think of it as making
- the size of the code for each ASCII character related to its SCRABBLE
- (tm) score. This type of compression, known as HUFFMAN coding, is much
- better at compressing text than RLE methods. Unfortunately, Huffman
- coding or its variants can often be too slow for games or other
- programs that require extremely fast decompression. HUFFMAN coding
- also requires creation and maintenance of structures to describe the
- codes, which can become quite confusing. By varying the codes as one
- traverses a file, the compression can be improved even more at the
- cost of greater file complexity. This is known as ADAPTIVE HUFFMAN
- compression.
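-
- To make the idea concrete, here is a tiny, purely illustrative sketch
- (not part of SLZ) that estimates how much a message shrinks when common
- letters get short codes and rare letters get long ones. The code
- lengths below are hypothetical; a real HUFFMAN coder derives them from
- the actual symbol frequencies and builds a proper prefix-free code.
-
-     #include <stdio.h>
-     #include <string.h>
-
-     /* Hypothetical, illustrative code lengths in bits (not a real
-      * prefix code) -- short for common letters, long for rare ones,
-      * the SCRABBLE-score idea described above. */
-     static int code_bits(int c)
-     {
-         switch (c)
-         {
-             case ' ': case 'e': case 't':           return 3;
-             case 'a': case 'o': case 'i':
-             case 'n': case 's': case 'r':           return 4;
-             case 'z': case 'q': case 'x': case 'j': return 9;
-             default:                                return 7;
-         }
-     }
-
-     int main(void)
-     {
-         const char *msg = "the rain in spain stays mainly in the plain";
-         long bits = 0;
-         long i, rawbits = 8L * (long)strlen(msg);
-
-         for (i = 0; msg[i] != '\0'; i++)
-             bits += code_bits((unsigned char)msg[i]);
-
-         printf("%ld bits coded vs %ld bits raw\n", bits, rawbits);
-         return 0;
-     }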
-
- Another method of compression is what is often called LZ (Lempel-Ziv)
- compression. This involves a "history" of some sort representing the last
- X characters read and a pointer to a current character buffer that has
- yet to be compressed. The compression program looks for strings matching
- the current string in the history and replaces matches with a data TAG
- which describes the offset and length of the data in the history buffer.
- This is the type of compression that will be discussed in this article.
-
- Important note from (UDM): This actually describes the LZ77 group of LZ's.
- "There are other algorithms in the LZ family (like LZ78/LZW as used in
- 'compress') that work totally different."
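-
- The heart of any LZ77-style packer is the search for the longest match
- in the history. A minimal sketch of that step (not the SLZ compressor
- itself, which appears later in this article) looks like this:
-
-     #define HIST    4096   /* history window, as used by SLZ below  */
-     #define MAXLEN  17     /* longest match worth recording         */
-
-     /* Return the length of the longest match for the string at
-      * buf[pos], searching the previous HIST bytes; *match_pos gets
-      * its start.  Matches may run into the current position
-      * (overlapping copies are allowed). */
-     long find_match(unsigned char *buf, long pos, long len, long *match_pos)
-     {
-         long start = (pos > HIST) ? pos - HIST : 0;
-         long best = 0, i;
-
-         for (i = start; i < pos; i++)
-         {
-             long n = 0;
-             while (pos + n < len && n < MAXLEN && buf[i + n] == buf[pos + n])
-                 n++;
-             if (n > best)
-             {
-                 best = n;
-                 *match_pos = i;
-             }
-         }
-         return best;   /* emit (pos - *match_pos, best) only if best > 2 */
-     }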
-
- Slightly more complicated is a compression called LZH, or LZ-Huffman
- compression. This combines LZ compression with HUFFMAN encoding
- methods, and LZH (or LH) often outperforms plain LZ. Adaptive LZH is
- even more complicated, but again can lead to better compression.
- Adaptive LZH is not always better, though: LhA uses static Huffman
- codes and usually outperforms LhArc, which uses dynamic adaptive
- encoding (UDM). However, these methods of compression are outside the
- scope of this article, which is directed mainly towards intermediate
- programmers.
-
- So which method is best? Well, I planned on including text, graphics,
- and other files in programs, with an emphasis on EXTREMELY FAST unpacking
- of compressed data. After examining all the methods available to me, I
- decided that the most flexible algorithm with enough speed would be a byte
- oriented LZ compression method. I decided to eliminate CRCs (checks on the
- integrity of the data) since the data would be "included" within a program
- and data integrity is most often a problem during telecommunications
- where a program would normally be transferred as part of an "LZH" or "ZOO"
- archive. I also eliminated bounds checking since from a programming
- perspective, I will always know the bounds of the data or I can include
- a simple structure to describe the size of the data without adding to the
- complexity of the compressed data itself.
-
- These ideas led to my creation of a simple LZ compression method that
- I call SLZ (version 0). After writing programs to implement my new
- compression technique, I found that compression was comparable to ZOO (v2.1)
- and ARC (v5.0). As a matter of fact, on the Fred Fish Library Contents-870,
- a 1,451,420 byte text, SLZ-0 outperformed ZOO by over 34K and ARC by over
- 40K !!!! Compression times for the SLZ-0 program are considerably
- longer than with ZOO, however (the code is not optimized at all and
- features exhaustive searching in C, which is excruciatingly slow).
- On the other hand, using the asm decompression routines from within a
- program yielded decompression about 15 times the speed of ZOO from the
- RAM: disk!!! (Please see Note 2 for more on this claim.)
-
- Part of this incredible decompression speed comes from the simplicity
- of the encoding method. SLZ-0 works on a TAG-data format. First, an 8-bit
- TAG is read which describes the next 8 chunks of data to follow. For each
- chunk in the data, a bit in the TAG will be set or cleared. A set bit
- represents compressed data and a cleared bit represents uncompressed data.
- Each time a cleared bit is encountered, the unpacker simply copies a byte
- from source to destination and continues on to the next chunk. When a set
- bit is encountered in the TAG, the unpacker reads a byte from the source.
- If this byte is zero, the routine exits. Otherwise, a second byte is
- read. The combination of these two bytes yields the location and length
- of the data to copy in the following form (O is offset - L is length):
-
- O11 O10 O9 O8 L3 L2 L1 L0 O7 O6 O5 O4 O3 O2 O1 O0 H=4096
-
- The location of the data is determined by subtracting the offset from the
- current output buffer pointer (as compressed bytes are copied from output
- to output and not input to output). The length is simply the value
- represented plus two.
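-
- In C, expanding a single CS specifier in this default H=4096 format
- looks roughly like the sketch below (the working decompressors follow
- later in the article):
-
-     /* Sketch: expand one compressed-string specifier (default format).
-      *   b1 = O11 O10 O9 O8 L3 L2 L1 L0      b2 = O7 .. O0
-      * The source is (output pointer - offset); copies may overlap the
-      * bytes just written, which is how long runs are reproduced. */
-     unsigned char *expand_cs(unsigned char b1, unsigned char b2,
-                              unsigned char *out)
-     {
-         unsigned int offset = ((b1 & 0xF0) << 4) | b2;   /* 12-bit offset */
-         unsigned int length = (b1 & 0x0F) + 2;           /* value + 2     */
-         unsigned char *src  = out - offset;
-
-         while (length--)
-             *out++ = *src++;
-         return out;                                      /* new out ptr   */
-     }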
-
- The compressor and decompression routine allow you to easily modify
- the TAG format by changing #defines/EQUs to allow for a choice of
- History=4096 bytes/MaxString=17 or History=2048 bytes/MaxString=33.
- The normal default is History=4096 (I found it to be generally better
- on everything but bitmapped graphics, which have long repeat strings).
- Changing to History=2048 results in the following CS TAG:
-
- O10 O9 O8 L4 L3 L2 L1 L0 O7 O6 O5 O4 O3 O2 O1 O0 H=2048
-
- Again, the location of the data is determined by subtracting the offset from the
- current output buffer pointer (as compressed bytes are copied from output
- to output and not input to output) and the length is simply the value
- represented plus two. However, if you change the compression, be sure
- to call the matching decompression routine.
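-
- For reference, the same decode step for the alternate H=2048 layout is
- sketched below; in SLZ.c this corresponds to MASK_upper=0xE0,
- MASK_lower=0x1F, SHIFT_UPPER=8, LSR_upper=3, HISTORY_SIZE=2048 and
- MAX_COMP_LEN=33.
-
-     /* Sketch: expand one CS specifier in the alternate H=2048 format.
-      *   b1 = O10 O9 O8 L4 L3 L2 L1 L0       b2 = O7 .. O0 */
-     unsigned char *expand_cs_2048(unsigned char b1, unsigned char b2,
-                                   unsigned char *out)
-     {
-         unsigned int offset = ((b1 & 0xE0) << 3) | b2;   /* 11-bit offset */
-         unsigned int length = (b1 & 0x1F) + 2;           /* value + 2     */
-         unsigned char *src  = out - offset;
-
-         while (length--)
-             *out++ = *src++;
-         return out;
-     }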
-
- Believe it or not, this simple compression scheme yields compression
- ratios similar to ZOO and decompression speeds that are several times
- faster. A simple asm implementation of the unpacker is provided for
- use in programs which include SLZ-0 compressed data. To include
- compressed data in a program, simply compress it with the compressor
- and use a directive such as INCBIN from the CAPE 68K assembler to
- include the data. Then call this routine from C or ASM with the source
- and destination pointers. It's as easy as falling off a log :)
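-
- For example, suppose your assembly source defines a label "packed_help"
- immediately followed by an INCBIN "help.slz" directive (both names are
- hypothetical, and the unpacked size below is made up). A minimal sketch
- of the C side, using the __regargs entry point from unpacker.asm:
-
-     #include <stdlib.h>
-
-     extern void __regargs UnPackSLZ(unsigned char *from, unsigned char *to);
-     extern unsigned char packed_help[];      /* label before the INCBIN */
-
-     #define HELP_RAW_SIZE 12000L             /* hypothetical raw size   */
-
-     unsigned char *get_help_text(void)
-     {
-         unsigned char *raw = (unsigned char *)malloc(HELP_RAW_SIZE);
-
-         if (raw)
-             UnPackSLZ(packed_help, raw);     /* a0 = from, a1 = to      */
-         return raw;                          /* caller frees when done  */
-     }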
-
- Here's the asm (highly documented for non-asm programmers).
-
- ---------------------------BEGIN ASM----------------------------------------
- CODE, PUBLIC
-
- ***** VOID __regargs UnPackSLZ(UBYTE *from, UBYTE *to)
- ***** calling from assembly: a0 , a1
-
- ***** This code takes a SLZ packed buffer and unpacks it to
- ***** another buffer. Note: for maximum speed, no checking of any
- ***** sort is performed for boundaries. Also, there is no CRC or
- ***** error correction/detection. It is assumed that your data is
- ***** valid.
-
- ***** Uncomment the following for use with C without __regargs.
- *****
- ***** XDEF _UnPackSLZ
- ***** _UnPackSLZ:
- ***** movea.l 4(SP),a0
- ***** movea.l 8(SP),a1
-
- ***** CS_MASK=(2^(8-CS_LSL))-1
- ***** CS_LSL increases with history size (4->H=4096)
- CS_MASK EQU 15
- CS_LSL EQU 4
-
- XDEF @UnPackSLZ
- @UnPackSLZ:
-
- movem.l d2-d3/a2,-(SP) ; Save Registers
-
- bra.b slz_start ; Skip to entry point
-
- slz_literal move.b (a0)+,(a1)+ ; Copy 8 byte literal string FAST!
- move.b (a0)+,(a1)+
- move.b (a0)+,(a1)+
- move.b (a0)+,(a1)+
- move.b (a0)+,(a1)+
- move.b (a0)+,(a1)+
- move.b (a0)+,(a1)+
- move.b (a0)+,(a1)+
-
- slz_start move.b (a0)+,d0 ; Load compression TAG
- beq.b slz_literal ; 8-byte literal string?
-
- moveq #7,d1 ; Loop thru 8 bits
- slz_nxtloop add.b d0,d0 ; Set flags for this compression TAG
- bcs.b slz_comp ; If bit is set then compress
- move.b (a0)+,(a1)+ ; Otherwise copy a literal byte
- dbf d1,slz_nxtloop ; Check and loop through 8 iterations
- bra.b slz_start ; Get next TAG
-
- slz_comp moveq #0,d2 ; Clear offset register
- move.b (a0)+,d2 ; Load compression specifier (cs) into d2
- beq.b slz_exit ; If cs is 0, exit (decompression finished)
- moveq #CS_MASK,d3 ; Copy cs into number reg and mask off bits
- and.w d2,d3 ; num = ( cs & CS_MASK ) [+ 2] ; {at least 3}
- lsl.w #CS_LSL,d2 ; Multiply cs_or by (2^CS_LSL)
- move.b (a0)+,d2 ; and replace lsb with rest of cs
- movea.l a1,a2 ; Now compute the offset from the current
- suba.w d2,a2 ; output pointer (d2 auto-extends)
-
- add.w d3,d3 ; Compute the unroll offset and begin
- neg.w d3 ; unrolled compressed data expansion
- jmp slz_unroll(pc,d3.w)
-
- slz_exit movem.l (SP)+,d2-d3/a2 ; Restore Registers
- rts ; EXIT routine
-
- ****
- **** The following is an unrolled loop for compressed data copying.
- **** The first part of the loop (lines ;33 - ;18) is not necessary
- **** unless you have specified a history of size 2048 and made the
- **** appropriate changes to SLZ.c as well.
- ****
-
- **** For when H=2048
- move.b (a2)+,(a1)+ ; 33
- move.b (a2)+,(a1)+ ; 32
- move.b (a2)+,(a1)+ ; 31
- move.b (a2)+,(a1)+ ; 30
- move.b (a2)+,(a1)+ ; 29
- move.b (a2)+,(a1)+ ; 28
- move.b (a2)+,(a1)+ ; 27
- move.b (a2)+,(a1)+ ; 26
- move.b (a2)+,(a1)+ ; 25
- move.b (a2)+,(a1)+ ; 24
- move.b (a2)+,(a1)+ ; 23
- move.b (a2)+,(a1)+ ; 22
- move.b (a2)+,(a1)+ ; 21
- move.b (a2)+,(a1)+ ; 20
- move.b (a2)+,(a1)+ ; 19
- move.b (a2)+,(a1)+ ; 18
-
- **** For when H=4096 do not include the above (lines 18-33)
- move.b (a2)+,(a1)+ ; 17
- move.b (a2)+,(a1)+ ; 16
- move.b (a2)+,(a1)+ ; 15
- move.b (a2)+,(a1)+ ; 14
- move.b (a2)+,(a1)+ ; 13
- move.b (a2)+,(a1)+ ; 12
- move.b (a2)+,(a1)+ ; 11
- move.b (a2)+,(a1)+ ; 10
- move.b (a2)+,(a1)+ ; 9
- move.b (a2)+,(a1)+ ; 8
- move.b (a2)+,(a1)+ ; 7
- move.b (a2)+,(a1)+ ; 6
- move.b (a2)+,(a1)+ ; 5
- move.b (a2)+,(a1)+ ; 4
- move.b (a2)+,(a1)+ ; 3
- slz_unroll move.b (a2)+,(a1)+ ; 2
- move.b (a2)+,(a1)+ ; 1
-
- dbf d1,slz_nxtloop ; Check and loop through 8 iterations
- bra.s slz_start ; Process Next TAG
-
- END
- ---------------------------END ASM------------------------------------------
-
- The compressor/decompressor program in C is mainly a utility to compress
- data for use by this high speed assembly unpacker. In general the program
- applies a brute force method for matching strings within the history and
- provides extremely slow compression. Decompression, however, is rather
- fast even within the C environment and using a file system.
-
- Even though the exhaustive searching is what leads to compression on
- par with ZOO (often better), it's painful in C. However, I didn't want to
- write this in asm as the code is currently very portable from machine to
- machine and it's much harder for intermediate programmers to read asm.
-
- I have made some improvements to the compression speed of the C
- utility, though. By defining the symbol "hashing_on", the program will
- allocate a 256K HASH table that is used for quickly matching strings,
- performing up to 16 collision checks based on the first two bytes of
- the current string. After these checks have been made, the program
- resorts to exhaustively scanning the rest of the history buffer. Even
- so, the hash table more than doubles the speed of compression. With
- hashing on, compression can still take a while, but you'll be rewarded
- with the high speed at which your data can be decompressed.
-
- The compressor works quite simply. First, it examines the string in
- the input file that is currently being pointed to. If the hashing option is
- present, the program checks hashed pointers for matching strings. It then
- searches the remainder of the history buffer exhaustively for matches of
- length greater than two. When it finds a string longer than the current
- best match, the current best match is updated. If a string of the maximum
- compressed string length is found, the searching is halted. As long as
- a string of length greater than two is found, the compressor sets a bit
- in the TAG and creates a CS specifier containing the offset and length
- of the compressed string for the output file. Otherwise, no bit in the
- TAG is set and a single literal byte is copied to the output.
- The program continues until the input is exhausted. When the input is
- exhausted, the program sets a bit in the TAG and writes a zero (byte) to the
- output signifying it has finished compression.
-
- An interesting aspect of this compression technique is how much data
- remains "intact" from input file to output file. When looking at the output
- of most compression programs, a human will see nothing but gibberish. The
- output of a text file compressed with SLZ will still look partially
- intelligible to a human and can be hand-decoded much more easily.
-
- The decompression is simply a C version of the asm decompressor (the
- asm was actually written first) that is file oriented in its output. It
- is robust but lacks optimization. This is OK, though, because it is
- still rather fast (extremely fast compared to compression).
-
- If anyone writes a faster version of this program (very easy to do since
- I have made few optimizations and the program is more "robust" in error
- handling than "efficient" in speed), please let me know. I plan on writing
- a faster version myself when I have the time but finding the time to write
- a ~40K article is hard enough.
-
- The program is used with the following syntax: SLZ (u or p) arg_src arg_dest
-
- SLZ FLAG SOURCE DESTINATION
- FLAG can be U - unpack P - pack
-
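- For example (file names are just examples):
-
-     SLZ p Article3.txt Article3.slz       (pack the file)
-     SLZ u Article3.slz Article3.txt       (unpack it again)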
-
- ---------------------------BEGIN C------------------------------------------
- ;/* SLZ.c - SLZ (un)packer -- Source for Lattice/SAS C version 5.10
-
- LC -rr -b1 -ccist -v -O -L SLZ.c
- quit
-
- */
-
- /*****************************************************************************
- SLZ - SLZ (un)packer. C. 1993 SilverFox SoftWare.
- Written by Adisak Pochanayon All Rights Reserved.
-
- This program is a simple example of using SLZ compression
- upon a file. It will allow you to both pack and unpack a file
- using SLZ compression. The syntax for calling this program from
- the CLI is:
-
- SLZ FLAG SOURCE DESTINATION
- FLAG can be U - unpack P - pack
-
- *****************************************************************************/
-
- // Comment out the following line to save 256K of memory, but doing so
- // can slow down compression (immensely :( ).
- #define hashing_on
-
- #include <stdio.h>
- #include <stdlib.h>
-
- /* defines for interpreting the CLI arguments */
- #define arg_flag 1
- #define arg_srce 2
- #define arg_dest 3
- #define arg_count 4
-
- /* defines for handling argument and file errors */
- #define slz_error 1024
- #define error_read 0
- #define error_args 1
- #define error_srce 2
- #define error_dest 4
- #define error_write 8
-
- /* defines for file compression / decompression */
- #define FILE_ERROR 0
- #define TRUE 1
- #define FALSE 0
-
- /*** GLOBAL VARIABLE FOR ERRORS AND FILES ***/
- int error=0;
- long length;
- short lzhist_offset=0;
-
- /*
- * SHIFT_UPPER is amount to multiply upper in one byte to get into next
- * higher byte. (H=4096 -> 16, H=2048 -> 8)
- * LSR_upper is amount to shift codeword to get upper byte into lower byte.
- * (H=4096 -> 4, H=2048 -> 3)
- * MAX_COMP_LENGTH = (2 ^ (8-LSR_upper)) + 1
- */
-
- #define HISTORY_SIZE 4096
- #define MASK_history (HISTORY_SIZE-1)
- #define MASK_upper (0xF0)
- #define MASK_lower (0x0F)
- #define SHIFT_UPPER 16
- #define LSR_upper 4
- #define MAX_COMP_LEN 17
-
- unsigned char LZ_history[HISTORY_SIZE];
- #ifdef hashing_on
- long far HASH_TABLE[65536];
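- /*
-  * CHASH(a,b,c) hashes the low six bits of the first two bytes of the
-  * current string into one of 4096 buckets; each bucket holds 16
-  * collision slots (c = 0..15), hence the 65536-entry table.
-  */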
- #define CHASH(a,b,c) ( ((((a&63)<<6)+(b&63))<<4)+c )
- #endif
-
- void writechar(register unsigned char outchar, register FILE *outfile)
- {
- if (putc(outchar,outfile)==EOF)
- {
- printf("Error writing output file.\n");
- fcloseall();
- exit(slz_error + error_write);
- }
- LZ_history[lzhist_offset]=outchar; lzhist_offset=(lzhist_offset+1)&MASK_history;
- }
-
- void UnPackSLZ(unsigned char *inbuffer, register FILE *outfile)
- {
- short myTAG, mycount, myoffset;
- long int loop1;
-
- for(;;) // loop forever (until goto occurs to break out of loop)
- {
- myTAG=*inbuffer++;
- for(loop1=0;(loop1!=8);loop1++)
- {
- if(myTAG&0x80)
- {
- if((mycount=*inbuffer++)==0) // Check EXIT
- { goto skip2; } // goto's are gross but it's efficient :(
- else
- {
- myoffset=HISTORY_SIZE-(((MASK_upper&mycount)*SHIFT_UPPER)+(*inbuffer++));
- mycount&=MASK_lower;
- mycount+=2;
- while(mycount!=0)
- {
- writechar(LZ_history[(lzhist_offset+myoffset)&MASK_history],outfile);
- mycount--;
- }
- }
- }
- else
- { writechar(*inbuffer++,outfile); }
- myTAG+=myTAG;
- }
- }
- skip2:
- return;
- }
-
- #ifdef hashing_on
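- /*
-  * ADDHASH records that the two-byte prefix (a,b) was just seen at input
-  * position c.  The 16 slots in its bucket are shifted down by one (the
-  * oldest position falls off the end) and c goes into slot 0, so the most
-  * recent -- and closest -- candidates are checked first.
-  */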
- void ADDHASH(long a,long b, long c)
- {
- long *INHASH,*BEFINHASH;
- INHASH=&HASH_TABLE[CHASH(a,b,15)];
- BEFINHASH=INHASH-1;
- *(INHASH--)=*(BEFINHASH--); *(INHASH--)=*(BEFINHASH--); *(INHASH--)=*(BEFINHASH--);
- *(INHASH--)=*(BEFINHASH--); *(INHASH--)=*(BEFINHASH--); *(INHASH--)=*(BEFINHASH--);
- *(INHASH--)=*(BEFINHASH--); *(INHASH--)=*(BEFINHASH--); *(INHASH--)=*(BEFINHASH--);
- *(INHASH--)=*(BEFINHASH--); *(INHASH--)=*(BEFINHASH--); *(INHASH--)=*(BEFINHASH--);
- *(INHASH--)=*(BEFINHASH--); *(INHASH--)=*(BEFINHASH--); *(INHASH)=*(BEFINHASH);
- *BEFINHASH=c;
- }
- #endif
-
- void PackSLZ(unsigned char *inbuffer, register FILE *outfile)
- {
- long int loop1,loop2, startvalue, endvalue, wherefound;
- unsigned char outchars[MAX_COMP_LEN*10];
- unsigned char *a_buffer, *b_buffer;
- short lenoutchars;
- short myTAG, iter;
- short done,found,temp;
-
- #ifdef hashing_on
- long *INHASH;
- short HASHCHECK;
-
- for(loop1=0;loop1!=65536;loop1++)
- {
- HASH_TABLE[loop1]=-1;
- }
- HASH_TABLE[CHASH(inbuffer[0],inbuffer[1],0)]=0;
- #endif
-
- loop1=1; loop2=length-1; iter=1; // Always do a literal first byte
- outchars[1]=inbuffer[0]; lenoutchars=2;
- done=FALSE;
-
- while(! done)
- {
- myTAG=0;
- while((iter<8)&&(! done))
- {
-
- /***--- This is the slowest part. I should rewrite it in ASM ---***/
-
- startvalue=loop1-MASK_history;
- if (startvalue<0) startvalue=0;
- found=0;
-
- #ifdef hashing_on
- INHASH=&HASH_TABLE[CHASH(inbuffer[loop1],inbuffer[loop1+1],0)];
- HASHCHECK=16; // Scan for matches in HASH TABLE
- while((HASHCHECK)&&((endvalue=*INHASH++)>=startvalue))
- {
- temp=0;
- a_buffer=&inbuffer[loop1];
- b_buffer=&inbuffer[endvalue];
- while((*(a_buffer++) == *(b_buffer++))&&(temp<MAX_COMP_LEN))
- {
- temp++;
- }
- if(found<temp)
- {
- if ((temp+loop1)>length)
- {
- temp=length-loop1;
- if(found<temp)
- {
- found=temp;
- wherefound=endvalue;
- }
- }
- else
- {
- found=temp;
- wherefound=endvalue;
- }
- if (found==MAX_COMP_LEN) { goto skip1; }
- }
- HASHCHECK--;
- }
- #else
- endvalue=loop1-1;
- #endif
-
- while(endvalue>=startvalue) // Scan for matches
- {
- temp=0;
- a_buffer=&inbuffer[loop1];
- b_buffer=&inbuffer[endvalue];
- while((*(a_buffer++) == *(b_buffer++))&&(temp<MAX_COMP_LEN))
- {
- temp++;
- }
- if(found<temp)
- {
- if ((temp+loop1)>length)
- {
- temp=length-loop1;
- if(found<temp)
- {
- found=temp;
- wherefound=endvalue;
- }
- }
- else
- {
- found=temp;
- wherefound=endvalue;
- }
- if (found==MAX_COMP_LEN) { goto skip1; }
- }
- endvalue--;
- }
- skip1:
-
- /***--- End of *SLOW* exhaustive history scan :( WHEW! ---***/
-
-
- if(found>2) // Compression will now occur
- {
- myTAG|=(128>>iter);
- endvalue=loop1-wherefound;
- startvalue=((endvalue>>LSR_upper)&MASK_upper)+(found-2);
- outchars[lenoutchars++]=startvalue;
- outchars[lenoutchars++]=(unsigned char)endvalue;
- loop2-=found;
- while(found!=0)
- {
- #ifdef hashing_on
- ADDHASH(inbuffer[loop1],inbuffer[loop1+1],loop1);
- #endif
- loop1++; found--;
- }
- }
- else // No Compression, copy literal byte
- {
- #ifdef hashing_on
- ADDHASH(inbuffer[loop1],inbuffer[loop1+1],loop1);
- #endif
- outchars[lenoutchars++]=inbuffer[loop1++];
- loop2--;
- }
- if(loop2<=0)
- {
- done=TRUE;
- if(loop2<0)
- {
- printf("HMMM... ooops!!!\n");
- }
- }
- iter++;
- }
- if(! done)
- {
- outchars[0]=myTAG;
- for(startvalue=0;startvalue!=lenoutchars;startvalue++)
- {
- // writechar(outchars[startvalue],outfile);
- /*****/
- if (putc(outchars[startvalue],outfile)==EOF)
- {
- printf("Error writing output file.\n");
- fcloseall();
- exit(slz_error + error_write);
- }
- /*****/
-
- }
- lenoutchars=1;
- iter=0;
- }
- }
- if(iter==8)
- {
- outchars[0]=myTAG;
- for(startvalue=0;startvalue!=lenoutchars;startvalue++)
- {
- writechar(outchars[startvalue],outfile);
- }
- writechar(255,outfile); writechar(0,outfile);
- }
- else
- {
- outchars[0]=myTAG|(128>>iter);
- for(startvalue=0;startvalue!=lenoutchars;startvalue++)
- {
- writechar(outchars[startvalue],outfile);
- }
- writechar(0,outfile);
- }
- }
-
- main(int argc, char *argv[])
- {
- FILE *infile, *outfile;
- unsigned char *inbuffer;
-
- printf(" SLZ -- SLZ (un)packer. C. 1993 SilverFox SoftWare.\n");
- printf(" Written by Adisak Pochanayon All Rights Reserved.\n\n");
-
- if (argc != arg_count)
- {
- error= slz_error + error_args;
- printf("Error -- Please check arguments.\n");
- }
- else if ((infile = fopen(argv[arg_srce],"r"))==FILE_ERROR)
- {
- error= slz_error + error_srce;
- printf("Error -- SOURCE file could not be opened.\n");
- }
- else if ((outfile = fopen(argv[arg_dest],"w"))==FILE_ERROR)
- {
- error= slz_error + error_dest;
- printf("Error -- DESTINATION file could not be opened.\n");
- }
- else
- {
- /*
- * Get the length of the input file (sorry it takes me 3 calls :( )
- * If you have a better way, let me know.
- */
- fseek(infile,0,2); length=ftell(infile); rewind(infile);
-
- if (length>2) if ((inbuffer=(unsigned char *)getml(length))!=0)
- {
- fread(inbuffer,1,length,infile); // assume this goes no problem :)
- switch (*argv[arg_flag])
- {
- case 'U' : case 'u' : printf("UnPacking File %s.\n",argv[arg_srce]);
- UnPackSLZ(inbuffer, outfile); break;
- case 'P' : case 'p' : printf("Packing File %s.\n",argv[arg_srce]);
- PackSLZ(inbuffer, outfile); break;
- default : error=slz_error + error_args; printf("Error -- no FLAG selected.\n");
- }
- rlsml(inbuffer,length);
- }
- }
-
- if (error!=0)
- printf("\nSLZ command syntax is: \"SLZ FLAG SOURCE DESTINATION\"\n where FLAG can be U - unpack P - pack.\n\n");
- else
- printf("\nSLZ successfully completed\n");
-
- fcloseall();
- exit(error);
- }
- -----------------------------END C------------------------------------------
-
- I achieved compression to 44.07% by applying SLZ to this text file.
- This is better than ZOO by almost 1,000 bytes!
- (0%=impossibly good - 100+%=really bad).
-
- -----------------------------------------------------------------------
-
- As usual, I invite any questions or comments. Please feel free
- to e-mail me at pochanay@cae.wisc.edu for more in-depth discussions.
- I am still looking for Amiga Artists to help with new games and
- programs I am planning on writing. (Music, graphics, etc).
-
- Also, if you're a commercial developer interested in exploiting
- my programming skills..... :)
-
- Adisak Pochanayon
- SilverFox SoftWare
- pochanay@cae.wisc.edu
-